JS Divergence


Rethinking Knowledge Distillation: A Data Dependent Regulariser With a Negative Asymmetric Payoff

Mason-Williams, Israel, Mason-Williams, Gabryel, Yannakoudakis, Helen

arXiv.org Artificial Intelligence

Knowledge distillation is often considered a compression mechanism when judged on the resulting student's accuracy and loss, yet its functional impact is poorly understood. In this work, we quantify the compression capacity of knowledge distillation and the resulting knowledge transfer from a functional perspective, decoupling compression from architectural reduction, which provides an improved understanding of knowledge distillation. We employ hypothesis testing, controls, and random control distillation to understand knowledge transfer mechanisms across data modalities. To rigorously test the breadth and limits of our analyses, we explore multiple distillation variants and analyse distillation scaling laws across model sizes. Our findings demonstrate that, while there is statistically significant knowledge transfer in some modalities and architectures, the extent of this transfer is less pronounced than anticipated, even under conditions designed to maximise knowledge sharing. Notably, in cases of significant knowledge transfer, we identify a consistent and severe asymmetric transfer of negative knowledge to the student, raising safety concerns in knowledge distillation applications. Across 12 experimental setups, 9 architectures, and 7 datasets, our findings show that knowledge distillation functions less as a compression mechanism and more as a data-dependent regulariser with a negative asymmetric payoff.
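As context for the distillation variants this paper analyses, below is a minimal sketch of the standard Hinton-style distillation objective that such studies build on; the temperature and mixing weight are illustrative, and the paper's specific variants may differ.

```python
# Minimal sketch of the standard (Hinton-style) knowledge distillation loss:
# cross-entropy on hard labels plus temperature-scaled KL to the teacher.
# T and alpha are illustrative hyperparameters, not the paper's settings.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale as hard ones
    return alpha * soft + (1.0 - alpha) * hard
```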



GEPD:GAN-Enhanced Generalizable Model for EEG-Based Detection of Parkinson's Disease

Zhang, Qian, Zhang, Ruilin, Zhu, Biaokai, Han, Xun, Xiao, Jun, Liu, Yifan, Wang, Zhe

arXiv.org Artificial Intelligence

Electroencephalography (EEG) has been established as an effective method for detecting Parkinson's disease, which is typically diagnosed in its early stages. Current Parkinson's disease detection methods have shown significant success within individual datasets; however, the variability in detection methods across different EEG datasets and the small size of each dataset pose challenges for training a generalizable model for cross-dataset scenarios. To address these issues, this paper proposes a GAN-enhanced generalizable model, named GEPD, specifically for EEG-based cross-dataset classification of Parkinson's disease. First, we design a generative network that creates fused EEG data by controlling the distribution similarity between generated and real data. In addition, an EEG signal quality assessment model is designed to ensure the high quality of the generated data. Second, we design a classification network that utilizes a combination of multiple convolutional neural networks to effectively capture the time-frequency characteristics of EEG signals, while maintaining a generalizable structure and ensuring easy convergence. This work is dedicated to utilizing intelligent methods to study pathological manifestations, aiming to facilitate the diagnosis and monitoring of neurological diseases. The evaluation results demonstrate that our model performs comparably to state-of-the-art models in cross-dataset settings, achieving an accuracy of 84.3% and an F1-score of 84.0%, showcasing the generalizability of the proposed model.
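The abstract does not specify how distribution similarity is measured; the sketch below assumes, purely for illustration, a histogram-based Jensen-Shannon penalty between real and generated EEG amplitudes. All names here are hypothetical.

```python
# Minimal sketch of one way a generator could "control the distribution
# similarity between generated data and real data". The paper does not give
# the exact measure; a histogram-based JS divergence is assumed for illustration.
import numpy as np
from scipy.spatial.distance import jensenshannon

def distribution_similarity_penalty(real_eeg, fake_eeg, bins=64):
    """JS divergence between amplitude histograms of real and generated EEG."""
    lo = min(real_eeg.min(), fake_eeg.min())
    hi = max(real_eeg.max(), fake_eeg.max())
    p, _ = np.histogram(real_eeg, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(fake_eeg, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q) ** 2  # scipy returns the distance; square for divergence
```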


Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization

Nguyen, Dai Hai, Mamitsuka, Hiroshi, Nakamura, Atsuyoshi

arXiv.org Machine Learning

We address the optimization problem of simultaneously minimizing multiple objective functionals over a family of probability distributions. This type of Multi-Objective Distributional Optimization commonly arises in machine learning and statistics, with applications in areas such as multiple-target sampling, multi-task learning, and multi-objective generative modeling. To solve this problem, we propose an iterative particle-based algorithm, which we call Multiple Wasserstein Gradient Descent (MWGraD). It constructs a flow of intermediate empirical distributions, each represented by a set of particles, that gradually minimizes the multiple objective functionals simultaneously. Specifically, MWGraD consists of two key steps at each iteration. First, it estimates the Wasserstein gradient for each objective functional based on the current particles. Then, it aggregates these gradients into a single Wasserstein gradient using dynamically adjusted weights and updates the particles accordingly. In addition, we provide theoretical analysis and present experimental results on both synthetic and real-world datasets, demonstrating the effectiveness of MWGraD.
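A minimal sketch of one MWGraD-style iteration, assuming each objective functional is a simple potential energy whose Wasserstein gradient at a particle is the potential's gradient; uniform weights stand in for the paper's dynamically adjusted ones.

```python
# Minimal sketch of a multiple-Wasserstein-gradient particle update, assuming
# each objective is F_i(mu) = E_{x~mu}[f_i(x)], whose Wasserstein gradient at a
# particle is grad f_i(x). The paper's functionals, gradient estimators, and
# weighting scheme are more general; uniform weights are an assumption.
import numpy as np

def mwgrad_step(particles, grad_fns, weights=None, step=0.05):
    """One step: aggregate per-objective gradients, then move all particles."""
    k = len(grad_fns)
    w = np.full(k, 1.0 / k) if weights is None else weights
    agg = sum(wi * np.stack([g(x) for x in particles]) for wi, g in zip(w, grad_fns))
    return particles - step * agg

# Toy usage: pull particles toward two targets (0 and 3) simultaneously.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
grads = [lambda x: x - 0.0, lambda x: x - 3.0]  # gradients of |x - c|^2 / 2
for _ in range(100):
    X = mwgrad_step(X, grads)
print(X.mean())  # ~1.5, the compromise between the two objectives
```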


Is analogy enough to draw novel adjective-noun inferences?

Ross, Hayley, Davidson, Kathryn, Kim, Najoung

arXiv.org Artificial Intelligence

Recent work (Ross et al., 2025, 2024) has argued that the ability of humans and LLMs respectively to generalize to novel adjective-noun combinations shows that they each have access to a compositional mechanism to determine the phrase's meaning and derive inferences. We study whether these inferences can instead be derived by analogy to known inferences, without need for composition. We investigate this by (1) building a model of analogical reasoning using similarity over lexical items, and (2) asking human participants to reason by analogy. While we find that this strategy works well for a large proportion of the dataset of Ross et al. (2025), there are novel combinations for which both humans and LLMs derive convergent inferences but which are not well handled by analogy. We thus conclude that the mechanism humans and LLMs use to generalize in these cases cannot be fully reduced to analogy, and likely involves composition.
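A minimal sketch of the kind of similarity-based analogy model described: a novel adjective-noun phrase inherits the inference label of its most similar known phrase. The embedding source and the additive combination of adjective and noun similarities are assumptions, not the paper's exact model.

```python
# Minimal sketch of analogy via lexical similarity: predict a novel
# adjective-noun phrase's inference label by copying the label of the nearest
# known phrase. The embed() function and scoring rule are hypothetical.
import numpy as np

def analogy_predict(novel_adj, novel_noun, known, embed):
    """known: list of (adj, noun, label); embed: word -> np.ndarray."""
    def sim(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    a, n = embed(novel_adj), embed(novel_noun)
    best = max(known, key=lambda kn: sim(a, embed(kn[0])) + sim(n, embed(kn[1])))
    return best[2]  # inherit the nearest analog's inference label
```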


Reviews: PacGAN: The power of two samples in generative adversarial networks

Neural Information Processing Systems

Summary: While Generative Adversarial Networks (GANs) have become the preferred choice for generative tasks in the community, they also suffer from the nagging issue of mode collapse. The current literature also has some empirical ways to handle this issue. The authors present the technique of packing, in which the discriminator now uses multiple samples jointly in its task. Detailed Comments: Clarity: The paper is very well written; both rigorous and intuitive expositions are presented. Originality: As explained in the summary above, this is perhaps the first time a formal framework for mode collapse has been constructed and its theoretical underpinnings discussed.
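For reference, packing can be sketched as a discriminator that judges m samples jointly by concatenating them; the layer sizes below are illustrative, not the paper's architecture.

```python
# Minimal sketch of packing: the discriminator sees m samples concatenated
# along the feature dimension, making low-diversity (mode-collapsed) batches
# easier to detect. Hidden sizes are illustrative.
import torch
import torch.nn as nn

class PackedDiscriminator(nn.Module):
    def __init__(self, dim, pack=3, hidden=128):
        super().__init__()
        self.pack = pack
        self.net = nn.Sequential(
            nn.Linear(dim * pack, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):  # x: (batch, dim), with batch divisible by pack
        packed = x.view(-1, x.size(1) * self.pack)  # (batch/pack, dim*pack)
        return self.net(packed)
```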


Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Khanal, Bishwash, Capone, Jeffery M.

arXiv.org Artificial Intelligence

Large language models (LLMs) offer powerful capabilities but incur substantial computational costs, driving the need for efficient compression techniques. This study evaluates the impact of popular compression methods - Magnitude Pruning, SparseGPT, and Wanda - on the LLaMA-2-7B model, focusing on the trade-offs between model size reduction, downstream task performance, and the role of calibration data. Our findings reveal that while SparseGPT and Wanda preserve perplexity even at 50% sparsity, they suffer significant degradation on downstream tasks, highlighting the inadequacy of perplexity as the sole evaluation metric. To address this, we introduce Jensen-Shannon (JS) Divergence as a more comprehensive metric that captures nuanced changes in model behavior post-compression. We further demonstrate that task-specific calibration data significantly enhances the downstream performance of compressed models compared to general calibration data. This research underscores the necessity for diverse evaluation metrics and careful calibration data selection to fully understand the complexities of LLM compression and its implications for practical applications.
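A minimal sketch of the proposed use of JS divergence: compare the next-token distributions of the original and compressed models on the same inputs. The smoothing constant is illustrative, and this is not the paper's evaluation code.

```python
# Minimal sketch of JS divergence between an original and a compressed model's
# next-token distributions, given their logits for the same context.
import torch
import torch.nn.functional as F

def js_divergence(p_logits, q_logits, eps=1e-12):
    """Jensen-Shannon divergence between two next-token distributions."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(eps).log() - b.clamp_min(eps).log())).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # bounded by log(2)
```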


The Significance of Latent Data Divergence in Predicting System Degradation

Fernandes, Miguel, Silva, Catarina, Cardoso, Alberto, Ribeiro, Bernardete

arXiv.org Artificial Intelligence

Condition-Based Maintenance is pivotal in enabling the early detection of potential failures in engineering systems, where precise prediction of the Remaining Useful Life is essential for effective maintenance and operation. However, a predominant focus in the field centers on predicting the Remaining Useful Life using unprocessed or minimally processed data, frequently neglecting the intricate dynamics inherent in the dataset. In this work we introduce a novel methodology grounded in the analysis of statistical similarity within latent data from system components. Leveraging a specifically designed architecture based on a Vector Quantized Variational Autoencoder, we create a sequence of discrete vectors which is used to estimate system-specific priors. We infer the similarity between systems by evaluating the divergence of these priors, offering a nuanced understanding of individual system behaviors. The efficacy of our approach is demonstrated through experiments on the NASA commercial modular aero-propulsion system simulation (C-MAPSS) dataset. Our validation not only underscores the potential of our method in advancing the study of latent statistical divergence but also demonstrates its superiority over existing techniques.
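A minimal sketch of the prior-divergence idea, assuming each system's prior is the empirical distribution over its VQ-VAE codebook indices and that priors are compared with a symmetrized KL; the paper's exact estimator may differ.

```python
# Minimal sketch: estimate each system's prior as the empirical distribution
# over its discrete VQ-VAE codes, then compare systems via symmetrized KL.
import numpy as np

def code_usage_prior(codes, codebook_size, eps=1e-8):
    counts = np.bincount(codes, minlength=codebook_size).astype(float) + eps
    return counts / counts.sum()

def symmetric_kl(p, q):
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Toy usage: two engines whose latent code usage drifts apart as one degrades.
p = code_usage_prior(np.array([0, 1, 1, 2, 2, 2]), codebook_size=4)
q = code_usage_prior(np.array([2, 3, 3, 3, 1, 3]), codebook_size=4)
print(symmetric_kl(p, q))
```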


Synthetic Tabular Data Validation: A Divergence-Based Approach

Apellániz, Patricia A., Jiménez, Ana, Galende, Borja Arroyo, Parras, Juan, Zazo, Santiago

arXiv.org Artificial Intelligence

The ever-increasing use of generative models in various fields where tabular data is used highlights the need for robust and standardized validation metrics to assess the similarity between real and synthetic data. Current methods lack a unified framework and rely on diverse and often inconclusive statistical measures. Divergences, which quantify discrepancies between data distributions, offer a promising avenue for validation. However, traditional approaches calculate divergences independently for each feature due to the complexity of joint distribution modeling. This paper addresses this challenge by proposing a novel approach that uses divergence estimation to overcome the limitations of marginal comparisons. Our core contribution lies in applying a divergence estimator to build a validation metric considering the joint distribution of real and synthetic data. We leverage a probabilistic classifier to approximate the density ratio between datasets, allowing the capture of complex relationships. We specifically calculate two divergences: the well-known Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence. KL divergence offers an established use in the field, while JS divergence is symmetric and bounded, providing a reliable metric. The efficacy of this approach is demonstrated through a series of experiments with varying distribution complexities. The initial phase involves comparing estimated divergences with analytical solutions for simple distributions, setting a benchmark for accuracy. Finally, we validate our method on a real-world dataset and its corresponding synthetic counterpart, showcasing its effectiveness in practical applications. This research offers a significant contribution with applicability beyond tabular data and the potential to improve synthetic data validation in various fields.
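A minimal sketch of the classifier-based estimator described: a probabilistic classifier separating real from synthetic rows approximates the density ratio, whose expected log gives a KL estimate (JS follows analogously by classifying each dataset against the mixture). The classifier choice is illustrative.

```python
# Minimal sketch of classifier-based KL estimation between real and synthetic
# tabular data: the classifier's log-odds approximate the log density ratio.
import numpy as np
from sklearn.linear_model import LogisticRegression

def kl_via_classifier(real, synth):
    """Estimate KL(real || synth) from a real-vs-synthetic classifier."""
    X = np.vstack([real, synth])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # Bayes-optimal log-odds = log(p_real/p_synth) + log(n_real/n_synth)
    log_odds = clf.decision_function(real)
    prior_correction = np.log(len(real) / len(synth))
    return float(np.mean(log_odds - prior_correction))
```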


Machine unlearning through fine-grained model parameters perturbation

Zuo, Zhiwei, Tang, Zhuo, Li, Kenli, Datta, Anwitaman

arXiv.org Artificial Intelligence

Machine unlearning techniques, which involve retracting data records and reducing their influence on trained models, help with the objective of user privacy protection but incur significant computational costs. Weight-perturbation-based unlearning is a general approach, but it typically involves globally modifying the parameters. We propose fine-grained Top-K and Random-k parameter-perturbation inexact machine unlearning strategies that address privacy needs while keeping the computational costs tractable. To demonstrate the efficacy of our strategies, we also tackle the challenge of evaluating the effectiveness of machine unlearning by considering the model's generalization performance across both the unlearning and remaining data. To better assess the unlearning effect and model generalization, we propose novel metrics, namely the forgetting rate and the memory retention rate. However, for inexact machine unlearning, current metrics are inadequate for quantifying the degree of forgetting that occurs after unlearning strategies are applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of the data targeted for unlearning. We then evaluate the degree of unlearning by measuring the performance difference of the models on the perturbed unlearning data before and after the unlearning process. By implementing these techniques and metrics, we achieve computationally efficient privacy protection in machine learning applications without significant sacrifice of model performance. Furthermore, this approach provides a novel method for evaluating the degree of unlearning.
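A minimal sketch of a Top-K perturbation step in the spirit of the abstract, ranking parameters by gradient magnitude on the forget set (an assumption) and adding noise only to the top K; the noise scale and selection rule are illustrative, not the paper's exact procedure.

```python
# Minimal sketch of fine-grained Top-K parameter perturbation: perturb only
# the K parameters with the largest gradient magnitude on the data to forget.
# The ranking criterion and noise scale are assumptions for illustration.
import torch

def top_k_perturb(model, forget_loss, k=1000, sigma=0.01):
    model.zero_grad()
    forget_loss.backward()
    grads = torch.cat([p.grad.abs().flatten() for p in model.parameters()])
    threshold = grads.topk(k).values.min()  # magnitude cut-off for the top-K entries
    with torch.no_grad():
        for p in model.parameters():
            mask = p.grad.abs() >= threshold  # select only the most influential weights
            p.add_(torch.randn_like(p) * sigma * mask)
    return model
```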